Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify allocation explain if random shard chosen #75670

Conversation

DaveCTurner
Copy link
Contributor

Today we often encounter users that are confused by the behaviour of
calling GET _cluster/allocation/explain without a body: it seems to
work, but it explains a random shard, and if this isn't the shard
they're thinking of then it's unclear how to proceed.

With this commit we add a note to the response when a shard was randomly
chosen indicating that it is possible, and possibly useful, to explain a
different shard. We also adjust the exception message in the case when
all shards are assigned to indicate why it's an invalid request and what
to do to make it valid.

Today we often encounter users that are confused by the behaviour of
calling `GET _cluster/allocation/explain` without a body: it _seems_ to
work, but it explains a random shard, and if this isn't the shard
they're thinking of then it's unclear how to proceed.

With this commit we add a note to the response when a shard was randomly
chosen indicating that it is possible, and possibly useful, to explain a
different shard. We also adjust the exception message in the case when
all shards are assigned to indicate why it's an invalid request and what
to do to make it valid.
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.0.0 v7.15.0 labels Jul 26, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 26, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner requested a review from repantis July 26, 2021 08:13
@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Jul 26, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec changes LGTM 👍

@tlrx
Copy link
Member

tlrx commented Aug 2, 2021

In my opinion we should always require a specific shard (unless there is only 1 unassigned shard)

@DaveCTurner
Copy link
Contributor Author

I know what you mean, but my experience has been that in many cases there are N (>1) unassigned shards, but they're all unassigned all for the same reason so explaining any of them is actually useful. There are other alternatives, we could try harder to make sure we're explaining a useful target (e.g. prefer primaries over replicas) or explain more of them in the diagnostics tooling, but simply making it an error not to specify a shard in this API would mean we're guaranteed to need another round-trip to the user which seems bad.

Copy link

@repantis repantis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DaveCTurner, this looks good to me. Apologies for the delay in reviewing. I left some minor comments, mostly for my own education. Thank you again for making the change.

@@ -25,6 +25,8 @@ GET _cluster/allocation/explain

`GET _cluster/allocation/explain`

`POST _cluster/allocation/explain`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I assume that POST should have been documented already but we had missed it, but please correct me if that is not the case. If that is the case, should we also be including examples for POST, similar to the ones we have for GET, or a note that they are equivalent?

Copy link
Contributor Author

@DaveCTurner DaveCTurner Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention is that all GET APIs that take a body should also work the same with POST, see e.g. https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html#api-conventions

I've added this here to emphasise that you can send a request with a body - some tooling makes it hard to execute a GET with a body, you have to switch it into POST model

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, @DaveCTurner thanks for clarifying. Good to know there is general guidance in the documentation already.

@@ -53,6 +66,11 @@ public ClusterAllocationExplanation(ShardRouting shardRouting, @Nullable Discove
}

public ClusterAllocationExplanation(StreamInput in) throws IOException {
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume BwC stands for backwards compatibility, is that correct? (I've seen it several times already, but I figured I should finally confirm).

I assume that adding an extra field in the API response is not a breaking change, is that correct?

I am also assuming we are being extra cautious and not introducing it in 7.x, but we could have technically done so, had we wanted to, is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, BwC is short for backwards compatibility.

Adding an extra field to the response is indeed not considered a breaking change, we expect clients to skip fields they don't recognise.

The labels on the PR indicate how this will be backported, so yes this will go into 7.x. However we have to start out with the change being 8.0-only because we need it to pass all the mixed-version tests in CI first. We'll then backport it and (simultaneously) adjust the version on this line to match. It's quite a complex dance.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, @DaveCTurner, thanks a lot for the context.

@DaveCTurner DaveCTurner merged commit 95edc6d into elastic:master Aug 2, 2021
@DaveCTurner DaveCTurner deleted the 2021-07-26-allocation-explain-messages-for-random-choice branch August 2, 2021 14:14
DaveCTurner added a commit that referenced this pull request Aug 2, 2021
Today we often encounter users that are confused by the behaviour of
calling `GET _cluster/allocation/explain` without a body: it _seems_ to
work, but it explains a random shard, and if this isn't the shard
they're thinking of then it's unclear how to proceed.

With this commit we add a note to the response when a shard was randomly
chosen indicating that it is possible, and possibly useful, to explain a
different shard. We also adjust the exception message in the case when
all shards are assigned to indicate why it's an invalid request and what
to do to make it valid.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Clients Meta label for clients team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.15.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants